Rise of the GenAI
ChatGPT was launched in November 2022, less than a year ago
2 talks at ApacheCon 2022: vector search only, not AI

16 talks, including 2 keynotes, at Community over Code 2023
OpenAI valuation at $29Bn, aiming for $90Bn
The training phase costs $$$:
• Gathering, cleaning, storing and processing data
• Highly qualified, hard-to-find ML engineers / data scientists required
• Long process / long time to market (TTM)
• Not reusable: one model needed per use case
Lowering the barrier to AI

GPT (Large Language Model): predicts the next word/token
Training phase:
• Input: public Internet data, as of September 2021
• ML engineering
• Costs $$$$, but done only once by a SaaS company (e.g. OpenAI)

Inference phase:
• Apply the model
• Generic Large Language Model, reusable, tailored for human language
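The training/inference split can be made concrete with a toy example in Python. This is not how GPT works internally: GPT is a neural network, not a table of word counts. The sketch only keeps the idea from the slide, which is that a model is trained once and then applied repeatedly to predict the next word. The corpus and the functions `train_bigrams`, `predict_next` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(corpus):
    """Training phase (toy): count which word follows which. Done once."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            model[current][following] += 1
    return model


def predict_next(model, word: str) -> Optional[str]:
    """Inference phase (toy): return the most frequent follower of `word`."""
    followers = model.get(word.lower())
    if not followers:
        return None  # word never seen during training: the model is static
    return followers.most_common(1)[0][0]


def generate(model, start: str, max_words: int = 5) -> str:
    """Repeatedly predict the next word and append it to the text."""
    words = [start]
    for _ in range(max_words):
        following = predict_next(model, words[-1])
        if following is None:
            break
        words.append(following)
    return " ".join(words)


model = train_bigrams(["the cat sat on the mat", "the cat ate the fish"])
print(generate(model, "sat", 3))  # sat on the cat
```

`predict_next` returns `None` for a word that does not appear in the training corpus. This mirrors the fact that a trained model cannot know anything that was missing from its training data.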
Using GenAI is just about building a prompt

Prompting
• The problem to solve is no longer in the model; it is in the prompt.
• The quality of the problem description directly impacts the accuracy of the result.
• The LLM is static:
  ◦ Live data must be included in the prompt.
  ◦ Information not present in the LLM training data must be included in the prompt.
• The size of the prompt is limited.
Anatomy of a prompt

Problem to solve:
Eg: "Should we propose a reduction coupon to the user?"

Dynamic data:
Eg: "user payment history, item price, item stock, ..."
Aggregate structured data from multiple sources in real time. Event-Driven Architectures are a perfect match for this.
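Below is a minimal sketch of the aggregation step. It assumes events arrive one at a time from a streaming system; in a real deployment they would be consumed from topics, for example Kafka. The event types `payment`, `price` and `stock` and their fields are hypothetical. The class keeps the latest state so that the dynamic data can be rendered into a prompt at any moment.

```python
from collections import defaultdict


class DynamicDataAggregator:
    """Up-to-date view fed by events from several (hypothetical) sources."""

    def __init__(self):
        self.payments = defaultdict(list)  # user -> list of payment amounts
        self.prices = {}                   # item -> latest price
        self.stock = {}                    # item -> latest stock quantity

    def on_event(self, event: dict) -> None:
        kind = event["type"]
        if kind == "payment":
            self.payments[event["user"]].append(event["amount"])
        elif kind == "price":
            self.prices[event["item"]] = event["price"]  # latest price wins
        elif kind == "stock":
            self.stock[event["item"]] = event["quantity"]
        else:
            raise ValueError(f"unknown event type: {kind}")

    def dynamic_data(self, user: str, item: str) -> str:
        """Render the current state as the 'dynamic data' part of a prompt."""
        history = self.payments.get(user, [])
        return (
            f"user payment history: {len(history)} payments, total {sum(history):.2f}; "
            f"item price: {self.prices.get(item, 'unknown')}; "
            f"item stock: {self.stock.get(item, 'unknown')}"
        )


agg = DynamicDataAggregator()
for e in [
    {"type": "payment", "user": "u1", "amount": 10.0},
    {"type": "payment", "user": "u1", "amount": 5.5},
    {"type": "price", "item": "i1", "price": 20.0},
    {"type": "price", "item": "i1", "price": 18.0},
    {"type": "stock", "item": "i1", "quantity": 3},
]:
    agg.on_event(e)
print(agg.dynamic_data("u1", "i1"))
```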
Domain specific data (RAG):
Eg: "rules to propose a reduction"
Data needed to solve the problem that was not scraped during training, or that we want to give more focus to. For instance, new data from after September 2021, or private data.
The prompt has a limited size (token ~= word):
• OpenAI's gpt-3.5-turbo: 4097 tokens
• OpenAI's gpt-3.5-turbo-16k: 16385 tokens
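The following sketch shows one way to assemble the three parts of a prompt under the size limit. It rests on three assumptions:
• As on the slide, one token is roughly one word. Real tokenizers differ.
• The domain chunks are already sorted by relevance.
• Part of the budget is reserved for the answer, because for these models the limit covers the prompt and the completion together.

`build_prompt` and its parameters are hypothetical.

```python
def count_tokens(text: str) -> int:
    """Rough estimate using the slide's rule of thumb: token ~= word."""
    return len(text.split())


def build_prompt(problem: str, dynamic_data: str, domain_chunks: list,
                 max_tokens: int = 4097, reserved_for_answer: int = 500) -> str:
    """Assemble problem + dynamic data + as many domain chunks as fit."""
    budget = max_tokens - reserved_for_answer
    parts = [f"Problem: {problem}", f"Dynamic data: {dynamic_data}"]
    header = "Domain specific data:"
    used = sum(count_tokens(p) for p in parts) + count_tokens(header)
    if used > budget:
        raise ValueError("problem and dynamic data alone exceed the prompt budget")
    kept = []
    for chunk in domain_chunks:  # assumed sorted by relevance, best first
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # stop at the first chunk that does not fit
        kept.append(chunk)
        used += cost
    if kept:
        parts.append(header + "\n" + "\n".join(kept))
    return "\n\n".join(parts)


prompt = build_prompt(
    "Should we propose a reduction coupon to the user?",
    "user payment history: 2 payments",
    ["rule one is short", "rule two " * 10],
    max_tokens=30,
    reserved_for_answer=0,
)
print(prompt)
```

The greedy loop is the reason RAG matters here. When the domain data is larger than the budget, only the most relevant chunks can be kept.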
Problem: the domain-specific data is huge and doesn't fit into the LLM prompt.
What is a vector/embedding?

[Figure: 2-dimension normalised vectors v1, v2, v3]

An embedding model transforms a text into a vector called an embedding.

The embedding can have N dimensions. For instance, OpenAI's embeddings have 1536 dimensions.

Similarity: v1 is more similar to v2 than to v3. Similarity is computed with a simple mathematical formula (e.g. cosine similarity).
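A commonly used similarity formula is cosine similarity: the dot product of two vectors divided by the product of their lengths. For normalised vectors it reduces to the plain dot product. Here is a minimal Python sketch, with made-up 2-dimension coordinates standing in for v1, v2 and v3:

```python
import math


def normalise(v):
    """Scale a vector to length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def cosine_similarity(a, b):
    """(a . b) / (|a| * |b|): 1 = same direction, 0 = orthogonal, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical coordinates, chosen so that v1 and v2 point in similar directions.
v1 = normalise([1.0, 0.2])
v2 = normalise([0.9, 0.4])
v3 = normalise([-0.3, 1.0])

# For normalised vectors, cosine similarity equals the plain dot product.
dot_v1_v2 = sum(x * y for x, y in zip(v1, v2))
print(cosine_similarity(v1, v2), cosine_similarity(v1, v3))
```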


[Figure: the words "cat", "dog" and "house" as embedding vectors]

The vector captures the essence 
of a word or a block of text within 
its context. 

The dimensions are the result of 
the LLM training. 


Vector search: vector stores / vector databases

• Embeddings storage (with or without metadata, depending on the DB)
• Built-in algorithms for fast retrieval of so-called "nearest-neighbor" embeddings (e.g. HNSW, JVector, ...)
• Vectors are a new type of data supported in established databases (DataStax AstraDB, Cassandra, ...) and in new specialized databases (Pinecone, Milvus, ...)
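To illustrate what a vector store provides, here is a brute-force sketch. It stores embeddings with optional metadata and returns the k nearest neighbours by cosine similarity. It scans every stored vector (O(n) per query), which is exactly the cost that approximate indexes such as HNSW avoid. The class and its method names are hypothetical and are not the API of any database listed above. The 2-dimension embeddings for "cat", "dog" and "house" are invented.

```python
import heapq
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class InMemoryVectorStore:
    """Exact nearest-neighbour search over all stored embeddings."""

    def __init__(self):
        self._rows = []  # (doc_id, embedding, metadata)

    def add(self, doc_id, embedding, metadata=None):
        self._rows.append((doc_id, embedding, metadata or {}))

    def nearest(self, query, k=3):
        """Return the ids of the k most similar embeddings, best first."""
        scored = ((cosine(query, emb), doc_id) for doc_id, emb, _ in self._rows)
        return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


store = InMemoryVectorStore()
store.add("cat", [1.0, 0.0], {"kind": "animal"})
store.add("dog", [0.9, 0.1], {"kind": "animal"})
store.add("house", [0.0, 1.0], {"kind": "building"})
print(store.nearest([1.0, 0.0], k=2))  # ['cat', 'dog']
```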
Let's go back to the problem: our domain-specific data is huge and doesn't fit into the LLM prompt.
Retrieval Augmented Generation (RAG)

Indexing the domain-specific data:
• Compute embeddings with an embeddings model (e.g. OpenAI)
• Store the embeddings in a vector database (e.g. Cassandra)

Answering a prompt question:
• Compute the embedding of the question with the same embeddings model
• Get the nearest neighbors from the vector database
• Add them to the prompt
• Send the prompt to the LLM (e.g. OpenAI) and get the response
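The two RAG flows can be sketched end to end in Python. This is a toy under strong assumptions:
• Word counts over a vocabulary stand in for the embeddings model.
• A Python list stands in for the vector database.
• The final LLM call is left out. The returned prompt is what would be sent to the LLM.

`ToyRag` and its methods are hypothetical.

```python
import math
import re


def tokenize(text):
    """Lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyRag:
    """Hypothetical stand-in for embeddings model + vector database + prompt."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.vocab = sorted({w for c in chunks for w in tokenize(c)})
        self.position = {w: i for i, w in enumerate(self.vocab)}
        # Indexing flow: "compute embeddings" then "store embeddings".
        self.vectors = [self.embed(c) for c in chunks]

    def embed(self, text):
        """Toy embedding: word counts over the chunk vocabulary (not a real model)."""
        vector = [0.0] * len(self.vocab)
        for w in tokenize(text):
            if w in self.position:
                vector[self.position[w]] += 1.0
        return vector

    @staticmethod
    def similarity(a, b):
        """Cosine similarity, 0.0 if either vector is all zeros."""
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        if na == 0 or nb == 0:
            return 0.0
        return sum(x * y for x, y in zip(a, b)) / (na * nb)

    def retrieve(self, question, k=1):
        """Query flow: embed the question with the same 'model', get nearest chunks."""
        q = self.embed(question)
        ranked = sorted(range(len(self.chunks)),
                        key=lambda i: self.similarity(q, self.vectors[i]),
                        reverse=True)
        return [self.chunks[i] for i in ranked[:k]]

    def build_prompt(self, question, k=1):
        """Add the retrieved chunks to the prompt that would be sent to the LLM."""
        context = "\n".join(self.retrieve(question, k))
        return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "A reduction coupon is offered after three late deliveries.",
    "Items ship within two days.",
    "Returns are accepted for thirty days.",
]
rag = ToyRag(chunks)
print(rag.build_prompt("When do we offer a reduction coupon?"))
```

The important design point is that the chunks and the question go through the same embedding function. Otherwise their vectors would not be comparable.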
Unstructured data
• Extract text (e.g. from PDF, HTML, MS doc)
• Normalise (lower case, trim spaces)
• Detect/filter/tag language
• Split into chunks
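Here is a minimal sketch of the normalise and split steps. It assumes the text has already been extracted and omits language detection, which needs a dedicated library. The word-based chunk size and the overlap between consecutive chunks are assumed values. Overlap is a common choice so that context cut at a chunk boundary is partly repeated in the next chunk.

```python
import re


def normalise(text: str) -> str:
    """Lower-case and collapse runs of whitespace, as on the slide."""
    return re.sub(r"\s+", " ", text).strip().lower()


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list:
    """Split into word-based chunks; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


print(split_into_chunks(normalise("  A b C   d e\nF g "), chunk_size=3, overlap=1))
```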
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» Generative Al features 


Structured data 
Compute fields 
Drop fields 
Flatten 

Cast to types 


Merge key and value 
fields 


Unwrap key or value 
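Most of the structured-data operations can be sketched as plain dictionary transformations on a record made of a key and a value. The field names (`item`, `price`, `quantity`, `internal_id`) and the `_` separator used when flattening are hypothetical. Unwrapping is not shown. This sketch is not how LangStream implements these agents.

```python
def flatten(record, parent="", sep="_"):
    """Turn nested dicts into top-level fields: {"a": {"b": 1}} -> {"a_b": 1}."""
    out = {}
    for name, value in record.items():
        full_name = f"{parent}{sep}{name}" if parent else name
        if isinstance(value, dict):
            out.update(flatten(value, full_name, sep))
        else:
            out[full_name] = value
    return out


def transform(key, value):
    record = {**key, **value}                  # merge key and value fields
    record = flatten(record)                   # flatten nested structures
    record.pop("internal_id", None)            # drop fields
    record["price"] = float(record["price"])   # cast to types
    record["total"] = record["price"] * record["quantity"]  # compute fields
    return record


key = {"order_id": 7}
value = {"item": {"name": "mug", "sku": "M1"}, "price": "4.50", "quantity": 2, "internal_id": "x"}
print(transform(key, value))
```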


AI operations
• Compute embeddings
• Store embeddings
• Perform vector search
• Re-rank (MMR, Maximal Marginal Relevance)
• Get chat completions
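Re-ranking with MMR picks documents that are relevant to the query but not redundant with the ones already picked. Below is a minimal greedy sketch. Each candidate is scored as `lam * sim(query, doc) - (1 - lam) * max(sim(doc, s) for s in selected)`. The weight `lam = 0.5` and the 2-dimension vectors are assumptions. With these values, ranking by similarity alone would return the near-duplicate documents 0 and 1, while MMR returns 0 and 2.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query, docs, k=2, lam=0.5):
    """Greedy Maximal Marginal Relevance; returns indices into `docs`."""
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query, docs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical 2-dimension embeddings: documents 0 and 1 are near-duplicates.
docs = [[1.0, 0.8], [1.0, 0.75], [0.6, 1.0]]
print(mmr([1.0, 1.0], docs, k=2))  # [0, 2]
```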


Source/Sink data
• Streaming system (Kafka, Pulsar, ...)
• Web site crawling
• Database
• S3 / block storage
• Micro-service
LangStream

[Figure: LangStream, AI libraries]
Kubernetes-native Kafka Connect

[Screenshot: Gunnar Morling, "An Ideation for Kubernetes-native Kafka Connect", blog post, Sep 6, 2022]


Kafka Connect, part of the Apache Kafka project, is a development framework and runtime for 
connectors which either ingest data into Kafka clusters (source connectors) or propagate data from Kafka 
into external systems (sink connectors). A diverse ecosystem of ready-made connectors has come to life 
on top of Kafka Connect, which lets you connect all kinds of data stores, APIs, and other systems to Kafka 
in a no-code approach. 


With the continued move towards running software in the cloud and on Kubernetes in particular, it's just 
natural that many folks also try to run Kafka Connect on Kubernetes. On first thought, this should be 
simple enough: just take the Connect binary and some connector(s), put them into a container image, 
and schedule it for execution on Kubernetes. As so often, the devil is in the details though: should you 
use Connect's standalone or distributed mode? How can you control the lifecycle of specific connectors 
via the Kubernetes control plane? How to make sure different connectors don't compete unfairly on 
resources such as CPU, RAM, or network bandwidth? In the remainder of this blog post, I'd like to explore 
running Kafka Connect on Kubernetes, what some of the challenges are for doing so, and how Kafka 
Connect could potentially be reimagined to become more "Kubernetes-friendly" in the future. 
Composable Agents

Sources:
• Kafka
• Pulsar
• Kafka-Connect
• Python custom
• Web crawler
• S3

Sinks:
• Kafka
• Pulsar
• Kafka-Connect
• Python custom
• Vector DB

Processors:
• Python custom
• Compute embeddings
• Chat completions
• Compute fields
• Drop fields
• Extract text
• Normalise text
• Detect language
• Split into chunks
• Query Vector DB
• Re-rank documents
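The idea of composing agents can be sketched as a chain of functions. A source yields records, each processor transforms a record or filters it out by returning `None`, and a sink consumes what is left. This in-process sketch only illustrates composition. It is not the LangStream Python agent API, nor how LangStream actually deploys agents. `run_pipeline` and the three processors are hypothetical.

```python
from typing import Callable, Iterable, Optional

Record = dict
Processor = Callable[[Record], Optional[Record]]


def run_pipeline(source: Iterable[Record], processors: list,
                 sink: Callable[[Record], None]) -> None:
    """Feed each record through the processors in order, then to the sink."""
    for record in source:
        for process in processors:
            record = process(record)
            if record is None:  # a processor filtered the record out
                break
        else:  # runs only if no processor dropped the record
            sink(record)


def normalise_text(record):
    return {**record, "text": " ".join(record["text"].split()).lower()}


def drop_empty(record):
    return record if record["text"] else None


def compute_length(record):
    return {**record, "length": len(record["text"].split())}


results = []
run_pipeline(
    [{"text": "  Hello   World "}, {"text": "   "}],
    [normalise_text, drop_empty, compute_length],
    results.append,
)
print(results)  # [{'text': 'hello world', 'length': 2}]
```

Because every processor has the same record-in, record-out shape, steps can be reordered or swapped without touching the others.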


Declarative low-code

Example application:
• Reads from input-topic
• Asks OpenAI for chat completions
• Writes streamed answer chunks to output-topic
• Writes full answers to history-topic

topics:
  - name: "input-topic"
    creation-mode: create-if-not-exists
  - name: "output-topic"
    creation-mode: create-if-not-exists
  - name: "history-topic"
    creation-mode: create-if-not-exists
pipeline:
  - name: "convert-to-json"
    type: "document-to-json"
    input: "input-topic"
    configuration:
      text-field: "question"
  - name: "ai-chat-completions"
    type: "ai-chat-completions"
    output: "history-topic"
    configuration:
      model: "${secrets.open-ai.chat-completions-model}"
      completion-field: "value.answer"
      log-field: "value.prompt"
      stream-to-topic: "output-topic"
      stream-response-completion-field: "value"
      min-chunks-per-message: 10
      messages:
        - role: user
          content: "You are a helpful assistant. Below you can find a question from the user. Please try to help them the best way you can.\n\n{{% value.question}}"
Example application: RAG chatbot

Pipeline 1: crawl a Web site, chunk it and store the embeddings.

Pipeline 2: get questions from Kafka, retrieve the chunks relevant to each question, answer it with OpenAI chat completions and output the results to Kafka.

Gateway: provides WebSocket endpoints over the Kafka topics.

[Figure: flow chart]
Demo
https://github.com/LangStream/langstream